Food insecurity is one of the most significant environmental justice challenges in the United States: more than 42 million people face hunger, and approximately 10.5% of US households experience some form of food insecurity. The COVID-19 pandemic has exacerbated hunger in America, with the greatest impact on families who were already facing hunger. Before the pandemic, more than 12 million children lived in food-insecure households; that number has since risen to 13 million. BIPOC communities face the highest rates of starvation and hunger in the nation. Within the Bay Area, 11.5% of people identify as food insecure, and only 38% of them qualify for food stamps. There are many ways to quantify food insecurity, but we will focus on easy access to supermarkets.
The USDA has developed a food access database that presents measures of supermarket accessibility by census tract. We aim to compare Alameda County, one of the areas facing the greatest food insecurity in the Bay Area, with San Francisco County. Both are equally urban and densely populated, but they have drastically different food health and food access issues.
Is there a statistical correlation between race, SNAP eligibility, and food access? What is the relationship between race and SNAP eligibility? What is the relationship between food access and income? What is the relationship between cardiovascular health and income level? Through these questions, we will draw conclusions about race, SNAP, health metrics, and income. We chose SNAP because it sits at the intersection of food and income in a single variable. We also acknowledge that this is not an exclusively urban problem (there is much evidence of food insecurity in rural areas); however, the urban setting exacerbates many of the issues detailed above.
(grouped by county eligibility)
We found that Alameda, Santa Clara, Contra Costa, and San Francisco counties have the highest number of qualifying households in the Bay Area. Moving on to our equity analysis, we narrow down to just Alameda County and San Francisco County because of their shared urban density and their differing food health and food access issues, which may make them the most interesting to compare.
Compared to the totals, the proportion of white people qualifying for SNAP decreased in both counties, while the proportion of Black or African American people increased in both counties. In San Francisco, the proportion of Asian people qualifying for SNAP increased slightly, whereas in Alameda County it decreased significantly. The proportions for Some Other Race alone, Native Hawaiian, American Indian and Alaska Native alone, and Two or More Races increased in both counties. This is not surprising, and the breakdown follows national trends (proportion of white being greatest, then Black/African American, then Hispanic and Asian). Due to our findings, we will be using Black or African American as our focus racial group from now on (health effects only). Though our results would most likely be different if we included ethnicity, for the purpose of this analysis we concentrate only on race.
(by PUMAs)
Let’s return to ACS data and compare four different variables in the Bay Area at the tract level: building type, SNAP allocation by household, tenure (owned or rented) and income.
To do so, we created a new binary variable, named allocated, which is 1 only when household income is below 66k/yr and the household was allocated SNAP benefits. This allows us to account for income, which is essential given that income is the main criterion for SNAP eligibility and we are interested in investigating additional explanatory power beyond income.
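The construction of the allocated variable can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame, its column names (income, snap) and its values are hypothetical stand-ins, not the fields or records used in the actual analysis.

```r
# Toy household records; column names and values are hypothetical stand-ins
# for the income and SNAP fields in the survey data.
households <- data.frame(
  income = c(24000, 58000, 71000, 40000, 90000),
  snap   = c(TRUE, TRUE, TRUE, FALSE, FALSE)
)

income_cutoff <- 66000  # the 66k/yr cutoff described in the text

# allocated = 1 only when both conditions hold:
# income below the cutoff AND the household received SNAP
households$allocated <- as.integer(
  households$income < income_cutoff & households$snap
)

print(households)
```

Note that the third toy household receives SNAP but sits above the cutoff, so it is coded 0, the same as households that receive no SNAP at all.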
Small disclaimer: Based on common sense, which we have learnt not to trust, we expect that some of these variables would be naturally correlated. For example, home ownership below a certain income bracket is very uncommon, so income and home ownership are likely to be correlated.
Our results for our logit model are below.
##
## Call:
## glm(formula = allocated ~ building + tenure + kitchen + puma,
## family = quasibinomial(), data = bay_pums_factored)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.6194 -0.2748 -0.2189 -0.1757 3.4737
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.16970 0.58825 -3.688 0.000228 ***
## building2 -1.02120 0.47519 -2.149 0.031678 *
## building3 -1.23964 0.56015 -2.213 0.026936 *
## building4 -0.64073 0.55375 -1.157 0.247293
## building5 -1.20333 0.55903 -2.153 0.031401 *
## building6 -1.06045 0.56704 -1.870 0.061520 .
## building7 -1.44002 0.61826 -2.329 0.019889 *
## building8 -1.43931 0.59055 -2.437 0.014833 *
## building9 -1.35707 0.55071 -2.464 0.013763 *
## building10 -15.49089 2904.25614 -0.005 0.995744
## tenure2 -0.89760 0.29275 -3.066 0.002179 **
## tenure3 -0.42106 0.24000 -1.754 0.079414 .
## tenure4 0.32974 0.43785 0.753 0.451431
## kitchen2 -1.59955 1.03580 -1.544 0.122584
## puma00102 -0.01476 0.42568 -0.035 0.972349
## puma00103 -0.07534 0.56326 -0.134 0.893593
## puma00104 0.35436 0.40276 0.880 0.378989
## puma00105 -0.30853 0.48853 -0.632 0.527711
## puma00106 -2.08315 1.06081 -1.964 0.049612 *
## puma00107 0.40156 0.42194 0.952 0.341290
## puma00108 0.03648 0.51598 0.071 0.943641
## puma00109 0.36878 0.45594 0.809 0.418648
## puma00110 0.10049 0.48883 0.206 0.837137
## puma07501 1.25659 0.40314 3.117 0.001837 **
## puma07502 0.93628 0.45753 2.046 0.040767 *
## puma07503 0.36615 0.51376 0.713 0.476062
## puma07504 -14.88140 467.16184 -0.032 0.974589
## puma07505 -0.46167 0.68132 -0.678 0.498054
## puma07506 -1.74479 1.06334 -1.641 0.100886
## puma07507 0.01798 0.49362 0.036 0.970946
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasibinomial family taken to be 1.012113)
##
## Null deviance: 1377.3 on 5387 degrees of freedom
## Residual deviance: 1294.0 on 5358 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 17
Data Dictionary:
Here are the meanings of the outcomes for each of the factors of our model.
Tenure
1. Owned with mortgage or loan (include home equity loans)
2. Owned free and clear
3. Rented
4. Occupied without payment of rent
Kitchen
Complete kitchen facilities
1. Yes, has stove or range, refrigerator, and sink with a faucet
2. No
Building
Units in structure
01. Mobile home or trailer
02. One-family house detached
03. One-family house attached
04. 2 apartments
05. 3-4 apartments
06. 5-9 apartments
07. 10-19 apartments
08. 20-49 apartments
09. 50 or more apartments
10. Boat, RV, van, etc.
Results from logit model:
Building Type: There is a statistically significant association between building types 2, 3, 5, 7, 8 and 9 and our allocated variable (SNAP allocation with income below 66k/yr). All of these have negative estimates, meaning that households in these building types are less likely to be allocated SNAP than households in the reference category, building type 1 (mobile home or trailer). The estimates are similar in size across these categories (roughly -1.0 to -1.4), so beyond the contrast with mobile homes, the model shows no strong relationship between building type and our allocated variable.
Tenure: The only statistically significant tenure type is tenure 2. The negative estimate indicates that households living in homes owned free and clear are less likely to be allocated SNAP than those living in a home owned with a mortgage or loan. This makes sense given the objective of SNAP assistance. Though not statistically significant, it is interesting to note that tenure 4, occupied without payment of rent (including shelters), has a positive estimate. Taken at face value, this would mean that households in tenure type 4 are more likely to be allocated SNAP, but the model cannot distinguish this estimate from zero.
Kitchen: Once again, the estimate is negative, but it is statistically insignificant, so there’s no reason to comment on the result.
PUMA: The three most statistically significant PUMAs are 00106, 07501 and 07502. The first has a negative coefficient and the other two have positive coefficients. Check the map below to see what geographical areas these PUMAs correspond to. Interestingly, the two PUMAs in San Francisco County are the ones with a positive effect size, while the one in Alameda County has a negative effect size. This is meaningful: compared with the reference PUMA, households in these two San Francisco PUMAs are more likely to be allocated SNAP, while households in PUMA 00106 of Alameda are less likely. This is also consistent with the poverty levels for these areas (higher in San Francisco than in Alameda), which are visible in our Poverty map shown below in the CalEnviroScreen section of our research.
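To make the size of these effects easier to read, the log-odds estimates can be converted to odds ratios by exponentiating them. The sketch below copies four estimates from the model output above; it assumes R's default treatment contrasts, so each ratio compares against the level omitted from the output (tenure 1 and the first PUMA).

```r
# Estimates copied from the logit output above (log-odds scale)
log_odds <- c(
  tenure2   = -0.89760,
  puma00106 = -2.08315,
  puma07501 =  1.25659,
  puma07502 =  0.93628
)

# exp() turns a log-odds coefficient into an odds ratio
# relative to the reference level of that factor
odds_ratio <- exp(log_odds)

# Percent change in the odds relative to the reference level
pct_change <- 100 * (odds_ratio - 1)

print(round(cbind(log_odds, odds_ratio, pct_change), 3))
```

For example, the tenure 2 estimate corresponds to an odds ratio of roughly 0.41, meaning the odds of being in the allocated group are about 59% lower than for tenure 1, holding the other variables fixed. The PUMA 07501 estimate corresponds to odds roughly 3.5 times those of the reference PUMA.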
Cardiovascular Disease in Alameda and San Francisco Counties
CalEnviroScreen measures Cardiovascular Disease by emergency department visits for cardiovascular events such as heart attack or death from heart attack.
This graph shows there is a notable difference in Cardiovascular health between San Francisco and Alameda County. The difference is most pronounced in the San Leandro area in Hayward, which has a score of 21.04.
Poverty in Alameda and San Francisco Counties
The indicator used by CalEnviroScreen for Poverty is the percent of the population with income less than two times the Federal Poverty level. The Federal Poverty level for 2021 is 26,500 dollars for a family of four. We chose Poverty as an indicator because, under Federal rules, household income must be at or below 130% of the poverty threshold to qualify for SNAP, which makes this analysis valuable for our project’s goal.
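The two income cutoffs mentioned here can be worked out directly from the quoted poverty level. The short sketch below uses only the figures stated in the text; the variable names are our own.

```r
federal_poverty_level <- 26500  # 2021 figure quoted in the text

# CalEnviroScreen Poverty indicator: income below two times the poverty level
calenviroscreen_cutoff <- 2.0 * federal_poverty_level

# SNAP income test as described in the text: at or below 130% of the poverty level
snap_cutoff <- 1.3 * federal_poverty_level

cat("CalEnviroScreen cutoff:", calenviroscreen_cutoff, "\n")
cat("SNAP cutoff:", snap_cutoff, "\n")

# Any income that passes the SNAP test also falls under the CalEnviroScreen cutoff
stopifnot(snap_cutoff < calenviroscreen_cutoff)
```

The cutoffs come to 53,000 and 34,450 dollars respectively, so the CalEnviroScreen indicator counts every household under the SNAP income limit plus households somewhat above it.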
In comparison to the previous map, this map shows a much more even distribution of households in poverty across the two counties. Both areas contain tracts with similarly low and similarly high poverty levels.
The scatter plot above does not show a clear relationship: there are several outliers, and the points themselves appear almost random.
Next, here is our model:
##
## Call:
## lm(formula = `Cardiovascular Disease` ~ Poverty, data = bay_cardio_poverty_tract)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.9114 -2.7953 -0.4989 2.0243 10.7705
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.43532 0.27951 33.76 < 2e-16 ***
## Poverty 0.05148 0.01046 4.92 1.15e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.692 on 549 degrees of freedom
## Multiple R-squared: 0.04223, Adjusted R-squared: 0.04048
## F-statistic: 24.2 on 1 and 549 DF, p-value: 1.147e-06
As you can see, a one-unit increase in Poverty is associated with an increase of 0.051 in Cardiovascular Disease, and the intercept of 9.435 is the predicted Cardiovascular Disease score when Poverty is 0. Only 4.2% of the variation in Cardiovascular Disease is explained by the variation in Poverty. The p-value of 1.147e-06 is below 5%, making these results statistically significant (also indicated by the number of *).
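The fitted line can be read directly from the coefficients in the summary: the intercept is the predicted score when Poverty is 0, and the slope is the change in the predicted score for each one-point increase in Poverty. The helper function below is our own illustration, not part of the original analysis.

```r
# Coefficients copied from the lm() summary above
intercept <- 9.43532
slope     <- 0.05148

# Fitted line: predicted Cardiovascular Disease score for a given Poverty value
predict_cardio <- function(poverty) intercept + slope * poverty

# Predicted scores at three illustrative Poverty values
poverty_values <- c(0, 20, 50)
print(data.frame(
  poverty   = poverty_values,
  predicted = predict_cardio(poverty_values)
))

# A 10-point rise in Poverty shifts the prediction by 10 * slope
print(predict_cardio(30) - predict_cardio(20))
```

Even a 10-point rise in the Poverty indicator moves the predicted score by only about half a point, which is small next to the residual standard error of 3.692.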
The graph above is a representation of the distribution of residuals from our model. While the peak is fairly close to 0, it is slightly skewed to the left and not evenly distributed on both sides.
We considered repeating these steps with a logarithmic transformation of the y axis. However, our data in the scatter plot does not appear to follow a curve, nor do our residuals seem to be significantly skewed. Thus, we have consciously chosen not to proceed with the transformation.
Next is a map of the residuals of our linear model. Essentially, this is the same information seen in the graph above, but represented spatially.
A small residual means the modeled regression closely matches the observed data, so the model could be used to make reasonable estimates. Consistent with the inconclusive scatter plot, our residuals span a wide range, from approximately -9 to approximately 11 according to the model summary. This is yet another indicator of the weakness of our model.
A positive residual represents an underestimation while a negative value represents an overestimation. The negative residuals (lighter colors) are concentrated around the San Francisco and Eastern Alameda area while the positive residuals (darker colors) are in shoreline areas of Alameda County. This shows a higher concentration of overestimation in San Francisco while Alameda has a higher concentration of underestimation.
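The sign convention can be illustrated with a small sketch. It reuses the fitted coefficients from the model summary, but the two tracts and their observed scores are hypothetical values chosen only to show one positive and one negative residual.

```r
# Fitted line from the lm() summary above
predict_cardio <- function(poverty) 9.43532 + 0.05148 * poverty

# Two hypothetical tracts with the same Poverty value but different observed scores
tracts <- data.frame(
  tract    = c("A", "B"),
  poverty  = c(30, 30),
  observed = c(15.0, 8.0)
)

tracts$fitted   <- predict_cardio(tracts$poverty)
tracts$residual <- tracts$observed - tracts$fitted  # residual = observed - fitted

# Positive residual: the model predicted too low (underestimate);
# negative residual: the model predicted too high (overestimate)
tracts$model_error <- ifelse(tracts$residual > 0, "underestimate", "overestimate")

print(tracts)
```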
In human terms, one possible explanation for why our model underestimates Cardiovascular Disease in some areas is that Cardiovascular Disease is measured in emergency department visits. Areas with less access to healthcare systems will have fewer recorded instances of cardiovascular trauma, and we suspect that systems of bias in these two counties shape that access. Thus, those most affected might be missing from the data. As a consequence, it is hard to create an accurate model for the relationship between these two indicators. The opposite of this logic applies to overestimation, especially given that this analysis is comparative in nature.
Lastly, given our inconclusive scatter plot and the small share of variation explained, this analysis does not support any strong conclusions.
The USDA defines food deserts as areas that are both low income and ones in which more than a third of the population at the census tract level lives over a mile from a grocery store or supermarket (10 miles for rural areas). We will focus only on the 1 mile threshold, given that all the areas in both our counties are classified as urban.
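The definition as paraphrased above can be written as a simple rule. This is a minimal sketch of that paraphrase only, not the USDA's full classification procedure: the function name, argument names and tract values are all hypothetical.

```r
# Flags a tract as a food desert under the definition paraphrased in the text.
# Argument names are hypothetical; pop_beyond is the population living farther
# than the distance threshold (1 mile urban, 10 miles rural) from a supermarket.
is_food_desert <- function(low_income, pop_total, pop_beyond) {
  share_beyond <- pop_beyond / pop_total
  low_income & share_beyond > 1 / 3
}

# Hypothetical urban tracts
tracts <- data.frame(
  low_income = c(TRUE, TRUE, FALSE),
  pop_total  = c(3000, 3000, 3000),
  pop_beyond = c(1500, 600, 2000)
)

tracts$food_desert <- is_food_desert(
  tracts$low_income, tracts$pop_total, tracts$pop_beyond
)

print(tracts)
```

Only the first toy tract is flagged: the second is low income but has too small a share of residents beyond the threshold, and the third has low access but is not low income.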
This map shows the population identified by the USDA as low income and as living more than a mile from food options (supermarkets, groceries, and convenience stores). This map stands in for one we think would be more interesting, a map of the individual stores themselves, which is beyond the scope of this quarter. Unfortunately, as you can see from the blank tracts on the map, most of the data was NULL, meaning it had no data attached to it. This makes the map difficult to analyze, as most of San Francisco and Inner Oakland have no data to interpret. That said, from what we can see, East Oakland seems to have a larger population that fits the category of low access (1 mile away) and low income. On the other hand, the 9 tracts in San Francisco for which we have data are very light coloured, meaning there is a smaller population with low income and low access.
Below is a bar chart detailing the number of individuals whose households are beyond a 1 mile radius of a grocery store by race within our two counties of interest.
While we know that Alameda County has a larger population than San Francisco County (almost double), the large difference in bar lengths suggests an even greater difference in population size. This is consistent with the map above, where the greater emptiness in San Francisco indicates fewer responses and less data overall.
Still, we have decided to plot a second graph showing percentages instead, so that we can make a fairer comparison and have an opportunity to draw takeaways. It is very important to acknowledge that we are making a choice to continue working with incomplete data for educational purposes.
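Converting counts to within-county percentages is what makes two counties of very different size comparable. A minimal sketch of that step follows; the counts are hypothetical and are not the values from our data.

```r
# Hypothetical counts of people living beyond 1 mile of a grocery store
counts <- matrix(
  c(400, 100, 300,
     60,  25,  15),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    county = c("Alameda", "San Francisco"),
    race   = c("White", "Black or African American", "Asian")
  )
)

# margin = 1 divides each row by its own total, so each county sums to 100%
percent_within_county <- 100 * prop.table(counts, margin = 1)

print(round(percent_within_county, 1))
```

With percentages, the toy San Francisco row is no longer dwarfed by the Alameda row, even though its raw counts are far smaller.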
This graph, however, is very informative: it shows that within both San Francisco and Alameda County, the race group making up the largest share of people facing food access issues is white people. This isn’t surprising given general population demographics (similar to our other equity analysis above). The second-largest group in San Francisco is the Black or African American community, while in Alameda it is the Asian community. For the third-largest group, those communities are flipped between the two counties, and the only other significant race category is Two or More Races.
Next, we plotted the same graph with the radius changed to a half mile in order to compare the change in race distributions. It is important to note that the population estimates for more than a half mile include those living more than 1 mile away.
In further work, perhaps next quarter, it would be very interesting to plot the individual grocery stores on a map and layer our equity analysis on top of that to see a clearer relationship between race and food access. Our hypothesis that there is a relationship between food access and race was mostly supported by our analyses, so it would be great to further this exploration with more tools next quarter. On the other hand, our exploration into the relationship between Cardiovascular Health and Poverty found only a weak relationship: statistically significant, but explaining about 4% of the variation. Lastly, our strongest results come from the logit model using ACS data, specifically looking at SNAP allocation likelihood based on PUMA.
This project gave us the opportunity to delve deeper into a serious issue within the Bay Area using our fall quarter tool kit. Though much of our analysis was pretty surface level, we were still able to create meaningful results with statistical significance.
Note: Feel free to look at our .Rmd file to see additional analyses we tried that were not significant or conclusive.